FlipDial: A Generative Model for Two-Way Visual Dialogue
نویسندگان
چکیده
We present FLIPDIAL, a generative model for Visual Dialogue that simultaneously plays the role of both participants in a visually-grounded dialogue. Given context in the form of an image and an associated caption summarising the contents of the image, FLIPDIAL learns both to answer questions and put forward questions, capable of generating entire sequences of dialogue (question-answer pairs) which are diverse and relevant to the image. To do this, FLIPDIAL relies on a simple but surprisingly powerful idea: it uses convolutional neural networks (CNNs) to encode entire dialogues directly, implicitly capturing dialogue context, and conditional VAEs to learn the generative model. FLIPDIAL outperforms the state-of-the-art model in the sequential answering task (1VD) on the VisDial dataset by 5 points in Mean Rank using the generated answers. We are the first to extend this paradigm to full two-way visual dialogue (2VD), where our model is capable of generating both questions and answers in sequence based on a visual input, for which we propose a set of novel evaluation measures and metrics.
منابع مشابه
Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning
The Visual Dialogue task requires an agent to engage in a conversation about an image with a human. It represents an extension of the Visual Question Answering task in that the agent needs to answer a question about an image, but it needs to do so in light of the previous dialogue that has taken place. The key challenge in Visual Dialogue is thus maintaining a consistent, and natural dialogue w...
متن کاملComparison of Bayesian Discriminative and Generative Models for Dialogue State Tracking
In this paper, we describe two dialogue state tracking models competing in the 2012 Dialogue State Tracking Challenge (DSTC). First, we detail a novel discriminative dialogue state tracker which directly estimates slot-level beliefs using deterministic state transition probability distribution. Second, we present a generative model employing a simple dependency structure to achieve fast inferen...
متن کاملAutomatic Colorization of Grayscale Images Using Generative Adversarial Networks
Automatic colorization of gray scale images poses a unique challenge in Information Retrieval. The goal of this field is to colorize images which have lost some color channels (such as the RGB channels or the AB channels in the LAB color space) while only having the brightness channel available, which is usually the case in a vast array of old photos and portraits. Having the ability to coloriz...
متن کاملDialogue patterns - A visual language for dynamic dialogue
A dynamic dialogue is a conversation in which each participant alternately selects remarks based on a changing world state and in which each remark can change the world state. Dynamic dialogues happen frequently as conversations between a player character (PC) and a non-player character (NPC) in a computer game. When it is the PC’s turn to speak, the current game state is used to filter the sta...
متن کاملBuilding End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models. Generative models produce system responses that are autonomously generated word-by-word, opening up the possibility for realistic, flexible interactions. In support of this goal, we extend the recently proposed hierarchical recurrent encoder-decoder neural ne...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1802.03803 شماره
صفحات -
تاریخ انتشار 2018